Fix MQTT reconnection loop in SatelliteController after broker disconnect#180
Open
daGrumpf-bxp wants to merge 1 commit into
Open
Conversation
The listen() loop had two bugs preventing reconnection: 1. except Exception was placed before except aiomqtt.MqttError, making the MqttError handler dead code (Exception catches all) 2. Both except blocks had 'break', exiting the while True loop on first disconnect, so the client never reconnected Fix: swap except order (MqttError before Exception), remove the breaks on connection errors. The existing asyncio.sleep(5) handles backoff.
4db7d46 to
f464f92
Compare
Member
|
Thank you for bringing this up as it was also anoying myself for some time and I never managed to fix it. I will look into it, and merge, once I have time. Probably next weekend |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the MQTT auto-reconnection logic in
SatelliteController.listen(). Currently, when the MQTT broker (e.g. mosquitto) restarts or has a brief outage, cbpi never reconnects and stays offline until a manualsystemctl restart craftbeerpiis performed. This is critical for users who control actors via MQTT (e.g. ESP-based relay boards for fermentation control) since cbpi stops sending commands entirely.Root cause
In
cbpi/controller/satellite_controller.py, thelisten()method has two related bugs:Exception ordering inverted:
except Exceptionis placed beforeexcept aiomqtt.MqttError. SinceMqttErrorinherits fromException, theMqttErrorhandler is dead code — Python evaluatesexceptclauses in order, andExceptioncatches everything first.breakon connection errors: Bothexcept Exceptionand (the unreachable)except aiomqtt.MqttErrorblocks containedbreak, which exits thewhile Truereconnection loop. The very first disconnect kills the loop permanently.Combined, this means:
aiomqttraises anMqttError(subclass ofException)except Exceptionblock catches it, logs an error, andbreaks out of the looppublish()calls fail silently with[code:4] The client is not currently connected.Reproduction
Tested on Raspberry Pi OS Bookworm, cbpi4 4.7.5, aiomqtt 2.5.1, mosquitto broker on localhost.
Before the fix: cbpi disappears from
ssoutput and never reappears. Log shows a singleMQTT General Exception: Disconnected during message iterationthen silence, followed by floods ofFailed to push data via mqtt: [code:4] The client is not currently connected.warnings.After the fix: cbpi logs
MQTT disconnected: ... Reconnecting in 5severy 5 seconds while the broker is down, then automatically reconnects when mosquitto is back up. TheESTABconnection reappears insswithin seconds of the broker recovery.Changes
Single file modified:
cbpi/controller/satellite_controller.pyasync def listen(self): while True: try: async with self.client as client: await client.subscribe("#") async for message in client.messages: for topic_filter in self.topic_filters: topic = topic_filter[0] method = topic_filter[1] if message.topic.matches(topic): await method(message) except asyncio.CancelledError: # Cancel self.logger.warning("MQTT Listening Cancelled") break - except Exception as e: - self.logger.error("MQTT General Exception: {}".format(e)) - break except aiomqtt.MqttError as e: - self.logger.error("MQTT Exception: {}".format(e)) + self.logger.warning("MQTT disconnected: {}. Reconnecting in 5s".format(e)) + except Exception as e: + self.logger.error("MQTT General Exception: {}. Reconnecting in 5s".format(e)) await asyncio.sleep(5)Three changes:
except aiomqtt.MqttErroris moved BEFOREexcept Exceptionso it actually fires.breakis removed from both connection-error handlers, allowing thewhile Trueloop to retry.MqttErrorbecomes awarning(it's an expected condition during broker outages, not an error), and both messages now clearly state that a reconnect is being attempted.The existing
await asyncio.sleep(5)provides a 5-second back-off between retries.asyncio.CancelledErroris left untouched — itsbreakis correct since it's the intended shutdown signal.Testing
Manually tested on the reproduction setup above with:
In all cases cbpi recovers automatically within
keepalive + 5safter the broker is reachable again.Backward compatibility
No API or behavior changes other than the fix itself. The log line numbers shift slightly which may affect anyone grepping for
satellite_controller.py:XXpatterns, but log messages remain searchable by content.